Return-Path: <icon-group-sender>
Received: from kingfisher.CS.Arizona.EDU (kingfisher.CS.Arizona.EDU [192.12.69.239])
    by baskerville.CS.Arizona.EDU (8.8.7/8.8.7) with SMTP id MAA14399
    for <icon-group-addresses@baskerville.CS.Arizona.EDU>; Fri, 13 Mar 1998 12:35:12 -0700 (MST)
Received: by kingfisher.CS.Arizona.EDU (5.65v4.0/1.1.8.2/08Nov94-0446PM)
    id AA17700; Fri, 13 Mar 1998 12:35:12 -0700
From: gep2@computek.net
Date: Fri, 13 Mar 1998 11:30:44 -0600
Message-Id: <199803131730.LAA18482@axp.cmpu.net>
Mime-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Subject: Letter Probabilities
To: icon-group@optima.CS.Arizona.EDU
X-Mailer: SPRY Mail Version: 04.00.06.17
Errors-To: icon-group-errors@optima.CS.Arizona.EDU
Status: RO
Content-Length: 2730

> I have a table that associates individual letters (one-char strings) with
> real numbers (probabilities). We can assume for the sake of argument that
> the sum of all probabilities in my table is unity.

> Given this table (which I already have Icon code to obtain) what is the
> most efficient method of generating random text? What I am thinking of at
> the moment is:

> (1) get a sorted list of [key,value] pairs, sorted by value (probability),
>     highest probability first

> (2) generate a random number from 0.0 to 1.0

> (3) use a while-loop to find the slot in the sorted list where the number
>     falls; I would subtract each passing probability until my placeholder
>     value had vanished; e.g.

>     i := 0      # running index
>     x := ?0     # random number 0.0 - 1.0
>     while x > 0 do {
>        i +:= 1
>        x -:= prob_list[i][2]
>     }
>     letter := prob_list[i][1]

You're trying to program it as if you were programming in C, and that C-style "clockwork mentality" is why you're having problems, IMHO.

What I think you ought to do is simply take your table and build a string (just once!) which contains a number of copies of each letter commensurate with its probability.

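A minimal sketch of that string-building step might look like the following. The names `probs` and `make_letterstring`, the three sample probabilities, and the resolution of 1000 characters are all assumptions made for illustration; none of them come from the original post.

```icon
# Sketch only: probs, make_letterstring, the sample values, and the
# size of 1000 are illustrative assumptions, not the poster's code.

# Build a string holding each letter about probs[letter] * size times.
procedure make_letterstring(probs, size)
   local s, k
   s := ""
   every k := key(probs) do
      s ||:= repl(k, integer(probs[k] * size + 0.5))   # round to nearest count
   return s
end

procedure main()
   local probs, letterstring
   probs := table(0.0)
   probs["e"] := 0.5
   probs["t"] := 0.3
   probs["a"] := 0.2
   letterstring := make_letterstring(probs, 1000)   # built just once
   every 1 to 60 do
      writes(?letterstring)    # ?s selects one character of s at random
   write()
end
```

The size sets the resolution: a letter whose probability rounds to zero copies at the chosen size drops out entirely, so a table with very rare letters would need a larger string.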
Then you can replace all this silly C-style nonsense with just:

    ?letterstring

...and they will "automagically" come out (as many as you need) with the probability you require.

In fact, you don't even need to start with your "probabilities" table at all... you can just take your demonstration text, put it all in a string and remove any characters you don't want there (quotation marks and other punctuation, for example) and you're all set.

> This all seems rather awkward to me, especially step (3).

Yup, exactly, that's because you're using a C-type programming mentality instead of embracing an Icon-native approach to the problem.

> Isn't there some construct in Icon that could do this more elegantly?

You BETCHA there is.  See above.  :-)

> P.S. I am already very well acquainted with the sample program
> 'monkeys.icn' in the distribution. This program uses multiple-character
> sequences, not individual letter probabilities. More characters give a
> better approximation to the source language, but I am interested
> specifically in single-character probabilities right at the moment.

You also ought to buy a copy of the fascinating book "Algorithms in SNOBOL4" by Gimpel (sold in facsimile reprint by Catspaw).  There are some exceedingly nicely done programs there (and very, very useful functions) which deal with random text generation too.

Gordon Peterson
http://www.computek.net/public/gep2/
Support the Anti-SPAM Amendment!  Join at http://www.cauce.org/
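
P.S. For the table-free variant mentioned above (using the demonstration text itself as the letter string), a rough sketch could be as follows. The procedure name `clean_text`, the sample sentence, and the choice to keep only lowercase letters and blanks are assumptions for illustration only.

```icon
# Sketch only: clean_text, the sample sentence, and the set of
# characters kept are illustrative assumptions.

# Fold the text to lower case and keep only letters and blanks.
procedure clean_text(text)
   local keep, s
   keep := &lcase ++ ' '          # cset of characters worth keeping
   s := ""
   map(text) ? while tab(upto(keep)) do
      s ||:= tab(many(keep))      # copy each run of wanted characters
   return s
end

procedure main()
   local letterstring
   letterstring := clean_text("The quick brown fox jumps over the lazy dog, twice!")
   every 1 to 60 do
      writes(?letterstring)       # letters appear as often as in the sample
   write()
end
```

Because the string is just the cleaned sample, each character's chance of being picked equals its relative frequency in that sample, with no separate probability table to maintain.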